Abstract
The scientific machine learning (SciML) field has introduced a new class of models called physics-informed neural networks (PINNs). These models incorporate domain-specific knowledge as soft constraints on a loss function and use machine learning techniques to train the model. Although PINN models have shown promising results for simple problems, they are prone to failure when moderate levels of complexity are added to the problems. We demonstrate that the existing baseline models, in particular PINN and evolutionary sampling (Evo), are unable to capture the solution to differential equations with convection, reaction, and diffusion operators when the imposed initial condition is non-trivial. We then propose a promising solution to address these types of failure modes. This approach couples curriculum learning with the baseline models: the network first trains on PDEs with simple initial conditions and is progressively exposed to more complex initial conditions. Our results show that the proposed method reduces the error by 1 to 2 orders of magnitude compared to regular PINN and Evo.

Keywords: Scientific machine learning, PINN, soft regularization, multiphysics modeling, chemical engineering PDEs

1. Introduction

Partial differential equations (PDEs) are frequently adopted to describe various phenomena in science and engineering and are generally founded on fundamental laws such as the conservation of mass or energy. Typically, finding analytical solutions to these PDEs for many real-world settings is not trivial and, in some cases, not feasible. Many conventional approaches have been proposed and studied over the years to approximate the solution to PDEs numerically, e.g., finite element methods (FEM) [1], the gradient discretization method [2], and spectral methods [3]. However, these methods can be computationally expensive since they involve discretizing the problem domain into a grid and updating the solution at each grid point. This can require many calculations and iterations, especially for complex problems such as turbulence simulations [4]. For this reason, as well as the availability of enormous amounts of data in scientific and engineering domains, there has been increasing interest in developing machine learning (ML)/deep learning (DL) methods to solve complex partial differential equations or to complement numerical solutions. Thus, the area of scientific machine learning (SciML) has emerged, integrating traditional scientific models based on differential equations with data-driven ML techniques, such as neural network training.

One of these methods is the so-called physics-informed neural network (PINN) [5-9]. The basic idea of PINNs for solving a forward PDE is to train a neural network to minimize errors with respect to the solution provided at initial/boundary points of a spatiotemporal domain, as well as the PDE residuals observed over a sample of interior points, referred to as collocation points. Because PINNs can incorporate physical laws and provide a flexible structure for the solution of PDEs, they have been extensively utilized for the multiphysics modeling of systems in chemical engineering. For example, PINNs have been adopted to model systems related to heat transfer [10-12], compressible and incompressible flows [13-18], and convection, reaction, and advection-diffusion systems [19-25]. The applications of the PINN method have also been extended to the study of environmental and materials engineering systems, such as the mitigation of carbon emissions [26-29] and the prediction of materials properties [30-32].

Despite the advantages PINNs offer, several recent studies show that training PINNs can be quite challenging for complicated systems [33-36]. In general, PINNs try to leverage the power of deep neural networks to learn the behavior of complex systems while respecting the underlying physical laws. This is achieved by incorporating the governing equations or physical laws as a soft constraint on the loss function, which is then minimized using ML techniques. However, solving the optimization problem may not be straightforward, as the imposed physical term in the loss function often involves nonlinearities that cause the loss function to be ill-conditioned [37-39]. Several works propose novel methods to tackle the challenges of training PINNs [40-42]. One early work identifies a failure mode of PINNs caused by unbalanced gradients during training and proposes an adaptive model that uses gradient statistics to assign appropriate weights to the different terms in the PINN's composite loss function [43]. Krishnapriyan et al. [33] describe curriculum regularization and sequence-to-sequence (seq2seq) learning as two promising solutions to address failure modes associated with large PDE coefficients. The importance of sampling strategies for the performance of PINNs has been the focus of many researchers. Subramanian et al. [35] argue that the location of collocation points greatly influences the trainability of PINNs, motivating the development of an adaptive collocation scheme that progressively accumulates more collocation points around areas where the model yields higher errors. In another recent line of work [43], it was shown that the PINN's performance depends on the successful propagation of the solution from the boundary/initial points to the interior points. To mitigate this "propagation failure", the authors proposed the so-called evolutionary sampling (Evo) strategy, in which collocation points evolve over training iterations to prioritize regions of high PDE residual. In contrast, Wang et al. [44] identify the rapid transition within the transition layer as the cause of failure and introduce a curriculum-based approach that encourages neural networks to prioritize learning on the easier non-layer regions.

In all the aforementioned literature, the initial conditions are assumed to be fixed during training, and a PINN has to be retrained for problems with different initial conditions. Conventional methods for handling complex initial conditions face significant challenges, as they typically require a fine resolution to capture steep gradients. In such cases, the traditional methods often lead to extensive computational costs and numerical instabilities. For example, in the coating process of semiconductors and MEMS devices, the final thickness of the resist film is predicted using numerical simulations based on the governing equations of liquid film flows [45, 46]. These problems are highly sensitive to the initial condition, and performing multiple simulations for different initial conditions is not cost-effective. Problems related to fluid dynamics and heat transfer, such as high-speed aerodynamic flows [47], biomedical flows [48], and the estimation of air pollution in a spatiotemporal domain [49], are mostly modeled using convection, diffusion, and reaction PDEs. Changing the initial condition when training a PINN for these problems may lead to a significant deviation of the predicted solution from the ground truth. Therefore, it is critical to investigate new approaches that improve the robustness of the model against variations in the initial condition. Motivated by this, we propose to combine the existing baseline models “PINN” and “Evo” [43] with “Curriculum Regularization” [33], where the neural network first trains with easier initial conditions and progressively approaches the target initial condition, which could be hard to optimize from the beginning.


2. Methods 
The formulation of PINNs starts with constructing a neural network f_θ(x, t) to approximate the solution u of a non-linear partial differential equation [43]:

$$\frac{\partial u}{\partial t} + \mathcal{N}_x[u] = 0, \quad x \in X,\ t \in [0,T]; \qquad u(x,0) = h(x), \quad x \in X; \qquad u(x,t) = g(x,t), \quad t \in [0,T],\ x \in \partial X.$$

Here, N_x is a non-linear spatial operator, x and t denote space and time, respectively, ∂X is the boundary of the spatial domain, and T is the time horizon. h(x) is the initial condition and g(x, t) is the boundary condition. To solve the PDE, we first need to compute the residual function R_θ(x, t) and the corresponding loss function L_r(θ) on a set of collocation points {x_j^r, t_j^r}, j = 1, ..., N_r, sampled uniformly from the entire spatio-temporal domain (Ω = X × [0, T]).


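$$R_\theta(x,t) = \frac{\partial f_\theta(x,t)}{\partial t} + \mathcal{N}_x\left[f_\theta(x,t)\right], \quad (1)$$

$$\mathcal{L}_r(\theta) = \frac{1}{N_r}\sum_{j=1}^{N_r}\left[R_\theta\left(x_j^r, t_j^r\right)\right]^2, \quad (2)$$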
where N_r is the number of collocation points. PINNs approximate the solution of a given PDE by minimizing the overall mean-squared loss, which consists of

$$\mathcal{L}(\theta) = \lambda_r\mathcal{L}_r(\theta) + \lambda_{ic}\mathcal{L}_{ic}(\theta) + \lambda_{bc}\mathcal{L}_{bc}(\theta).$$

The subscripts “r”, “ic”, and “bc” correspond to the residual, initial condition, and boundary condition, respectively. The hyperparameters λ signify the importance of each loss term in the overall loss function. Note that the main complication in training PINNs arises from the presence of the differential operator in L(θ), which causes the loss function to be ill-conditioned. This is very different from norm-based L1 and L2 regularizations, where the regularization operator corresponds to a simple convex function.
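To make the weighted loss concrete, here is a minimal sketch that evaluates λ_r L_r + λ_ic L_ic + λ_bc L_bc for a candidate solution. It assumes a periodic boundary on X = [0, 2π], a time horizon T = 1 (an assumption; the text does not state T), uniformly sampled points, and default weights λ_r = 1, λ_ic = 100, λ_bc = 100 read from Table 1 (the ordering of those weights is our interpretation). Plain Python functions stand in for the network and its derivatives, which in practice would come from a neural network and automatic differentiation; the names `composite_loss`, `u`, and `residual` are hypothetical.

```python
import math
import random


def composite_loss(u, residual, h, x_max=2 * math.pi, t_max=1.0,
                   n_r=1000, n_ic=256, n_bc=256,
                   lam_r=1.0, lam_ic=100.0, lam_bc=100.0, seed=0):
    """Weighted PINN loss lam_r*L_r + lam_ic*L_ic + lam_bc*L_bc.

    u(x, t)        -- candidate solution (stands in for the network f_theta)
    residual(x, t) -- PDE residual R_theta evaluated for that candidate
    h(x)           -- initial condition; the boundary term assumes periodicity
    """
    rng = random.Random(seed)
    # Residual term, Eq. (2): collocation points drawn uniformly from Omega.
    pts = [(rng.uniform(0.0, x_max), rng.uniform(0.0, t_max)) for _ in range(n_r)]
    loss_r = sum(residual(x, t) ** 2 for x, t in pts) / n_r
    # Initial-condition term: mismatch between u(x, 0) and h(x).
    xs = [rng.uniform(0.0, x_max) for _ in range(n_ic)]
    loss_ic = sum((u(x, 0.0) - h(x)) ** 2 for x in xs) / n_ic
    # Periodic boundary term: u(0, t) should equal u(x_max, t).
    ts = [rng.uniform(0.0, t_max) for _ in range(n_bc)]
    loss_bc = sum((u(0.0, t) - u(x_max, t)) ** 2 for t in ts) / n_bc
    total = lam_r * loss_r + lam_ic * loss_ic + lam_bc * loss_bc
    return total, (loss_r, loss_ic, loss_bc)


# Usage: the exact convection solution u = sin(x - beta*t) gives (near) zero loss,
# while a frozen profile u = sin(x) violates the PDE and is penalised.
beta = 5.0
exact = lambda x, t: math.sin(x - beta * t)
exact_res = lambda x, t: -beta * math.cos(x - beta * t) + beta * math.cos(x - beta * t)
frozen = lambda x, t: math.sin(x)
frozen_res = lambda x, t: beta * math.cos(x)  # u_t = 0, u_x = cos(x)
print(composite_loss(exact, exact_res, math.sin)[0])
print(composite_loss(frozen, frozen_res, math.sin)[0])
```

The frozen profile matches the initial and boundary conditions perfectly, so its entire loss comes from the residual term, which illustrates why the physics term, not the data terms, drives the optimization difficulty.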

2.1. Convection System 

We consider a one-dimensional convection problem with the following governing equation:

$$\frac{\partial u}{\partial t} + \beta\frac{\partial u}{\partial x} = 0, \quad x \in X,\ t \in [0,T], \quad (3)$$
where β is the convection coefficient. The initial and periodic boundary conditions are as follows:

$$h(x) = \sin(\alpha x + k\pi), \qquad u(0,t) = u(2\pi,t), \quad (4)$$

where α is the rate of change (angular frequency) of the function and k is the phase parameter.


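The general loss function can be obtained as follows:

$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{j=1}^{N}\left(\lambda_{ic}\left(u - h(x_j)\right)^2 + \lambda_r\left(\frac{\partial u}{\partial t} + \beta\frac{\partial u}{\partial x}\right)^2 + \lambda_{bc}\left(u(0,t) - u(2\pi,t)\right)^2\right), \quad (5)$$

where u = f_θ(x, t) represents the neural network's output.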
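For this problem the method of characteristics gives the travelling-wave solution u(x, t) = h(x − βt) = sin(α(x − βt) + kπ). Assuming α is an integer, as the range α = 1 to 5 suggests, this wave is 2π-periodic and also satisfies the boundary condition in Eq. (4). The sketch below evaluates that solution and checks with central differences that it makes the residual of Eq. (3) vanish. Whether the ground truth in the experiments was computed this way is an assumption, and the function names are hypothetical.

```python
import math


def exact_convection(x, t, alpha, beta, k):
    """u(x, t) = h(x - beta*t) with h(x) = sin(alpha*x + k*pi), from Eqs. (3)-(4)."""
    return math.sin(alpha * (x - beta * t) + k * math.pi)


def fd_residual(u, x, t, beta, eps=1e-5):
    """Central-difference estimate of the Eq. (3) residual u_t + beta*u_x at (x, t)."""
    u_t = (u(x, t + eps) - u(x, t - eps)) / (2.0 * eps)
    u_x = (u(x + eps, t) - u(x - eps, t)) / (2.0 * eps)
    return u_t + beta * u_x


# Usage: beta = 15, alpha = 5, k = 0.5 (one of the harder settings studied).
u = lambda x, t: exact_convection(x, t, alpha=5, beta=15, k=0.5)
for x, t in [(0.3, 0.1), (2.0, 0.7), (5.5, 0.95)]:
    # Residual should be ~0 and the periodic boundary mismatch ~0.
    print(fd_residual(u, x, t, beta=15), u(0.0, t) - u(2 * math.pi, t))
```

Larger α and β make the wave oscillate faster in both space and time, which is one intuitive reason these settings are harder for a PINN to fit.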


2.2. Reaction-Diffusion System 

The one-dimensional reaction-diffusion problem can be described using the following governing equation:

$$\frac{\partial u}{\partial t} - \nu\frac{\partial^2 u}{\partial x^2} - \rho u(1-u) = 0, \quad x \in X,\ t \in (0,T], \quad (6)$$

where ν > 0 and ρ are the diffusion and reaction coefficients, respectively. We consider a Gaussian distribution for the initial condition and a periodic boundary condition:

$$h(x) = \exp\left(-\frac{(x-\pi)^2}{2\sigma^2}\right), \qquad u(0,t) = u(2\pi,t). \quad (7)$$

Here, σ (= π/η) is the standard deviation and η is a constant used to scale σ. The overall loss function for this problem is given as:


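$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{j=1}^{N}\left[\lambda_{ic}\left(u - h(x_j)\right)^2 + \lambda_r\left(\frac{\partial u}{\partial t} - \nu\frac{\partial^2 u}{\partial x^2} - \rho u(1-u)\right)^2 + \lambda_{bc}\left(u(0,t) - u(2\pi,t)\right)^2\right]. \quad (8)$$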
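A reference solution for Eqs. (6) and (7) can be built with operator splitting. Over each small time step, the reaction part ∂u/∂t = ρu(1 − u) is solved exactly by logistic growth, u ← u e^{ρΔt} / (u e^{ρΔt} + 1 − u), and the periodic diffusion part ∂u/∂t = ν ∂²u/∂x² is solved exactly in Fourier space by damping mode m by e^{−ν m² Δt}. This is a minimal sketch of one standard approach, not necessarily how the ground truth in this work was computed. To stay dependency-free it uses a naive O(n²) discrete Fourier transform. The grid size, step count, final time, and function names are our assumptions.

```python
import cmath
import math


def gaussian_ic(x, eta):
    """Initial condition of Eq. (7): Gaussian centred at pi with sigma = pi / eta."""
    sigma = math.pi / eta
    return math.exp(-((x - math.pi) ** 2) / (2.0 * sigma ** 2))


def reaction_step(u, rho, dt):
    """Exact update of du/dt = rho*u*(1 - u) over one step dt (logistic growth)."""
    out = []
    for v in u:
        g = v * math.exp(rho * dt)
        out.append(g / (g + (1.0 - v)))
    return out


def diffusion_step(u, nu, dt):
    """Exact update of du/dt = nu*u_xx over dt on a periodic grid covering [0, 2*pi)."""
    n = len(u)
    damped = []
    for m in range(n):
        # Forward DFT coefficient of mode m (naive O(n^2) transform, no NumPy needed).
        c = sum(u[p] * cmath.exp(-2j * math.pi * m * p / n) for p in range(n))
        wavenumber = m if m <= n // 2 else m - n  # signed wavenumber for period 2*pi
        damped.append(c * math.exp(-nu * wavenumber ** 2 * dt))
    # Inverse DFT; the imaginary part is only round-off for real input.
    return [
        (sum(damped[m] * cmath.exp(2j * math.pi * m * p / n) for m in range(n)) / n).real
        for p in range(n)
    ]


def split_solve(eta, nu, rho, t_end, n_x=64, n_steps=50):
    """Lie splitting: alternate an exact reaction sub-step and an exact diffusion sub-step."""
    xs = [2.0 * math.pi * p / n_x for p in range(n_x)]
    u = [gaussian_ic(x, eta) for x in xs]
    dt = t_end / n_steps
    for _ in range(n_steps):
        u = diffusion_step(reaction_step(u, rho, dt), nu, dt)
    return xs, u


# Usage: the hardest case studied in the text, nu = rho = 3 and eta = 8.
xs, u_final = split_solve(eta=8, nu=3.0, rho=3.0, t_end=0.5)
print(max(u_final), sum(u_final) / len(u_final))
```

Splitting keeps each sub-step exact, so the only error comes from alternating the two operators; in practice an FFT library would replace the naive transform.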
Table 1. Hyperparameters applied in experiments on different PDE systems.

PDE                 | Method                                          | λ_r / λ_ic / λ_bc | LR scheduler
Convection          | PINN, PINN + Curriculum, Evo, Evo + Curriculum  | 1/100/100         | StepLR (rate = 0.9, steps = 5000)
Reaction-Diffusion  | PINN, PINN + Curriculum                         | 1/100/100         | None


2.3. Experiment Setup 

We first perform experiments on the time-dependent convection system using four different methods: “PINN”, “PINN + Curriculum”, “Evo”, and “Evo + Curriculum”. We consider two different convection coefficients (β = 5 and β = 15) and the initial condition sin(αx + kπ), with α ranging from 1 to 5 and k ranging from 0 to 0.5. We then study the reaction-diffusion problem (ν = ρ = 3) using the “PINN” and “PINN + Curriculum” baselines. The parameter η in the initial condition in Eq. (7) varies from 2 to 8. The neural network architecture consists of four fully connected layers with 50 neurons per layer and a hyperbolic tangent activation function. For all cases, we use a periodic boundary condition and the Adam optimizer with a learning rate lr = 1e-3. After training the models, we compute the L2 absolute error between the predicted result and the analytical solution (ground truth) as:

$$\text{Absolute error} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert U_i - u_i\right\rVert_2, \quad (9)$$

where U is the exact solution and u is the output of the NN. The number of collocation points is kept constant at N = 1000 over the whole domain Ω (n_x × n_t = 512 × 256). Note that n_x and n_t denote the number of grid points in the spatial and temporal domains, respectively. Table 1 lists the other hyperparameter settings for the different baseline methods.
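The error metric in Eq. (9) can be sketched as an average of pointwise absolute differences over an n_x × n_t = 512 × 256 evaluation grid. The grid layout (uniform; x = 2π excluded because it duplicates x = 0 under the periodic boundary; t ∈ [0, 1] with both endpoints), the choice T = 1, and the assumption that the average runs over this grid are ours, as are the function names.

```python
import math


def make_grid(n_x=512, n_t=256, x_max=2 * math.pi, t_max=1.0):
    """Uniform evaluation grid; x = x_max is skipped because it duplicates x = 0 (periodic BC)."""
    xs = [x_max * i / n_x for i in range(n_x)]
    ts = [t_max * j / (n_t - 1) for j in range(n_t)]
    return [(x, t) for t in ts for x in xs]


def absolute_error(exact, predicted, grid):
    """Eq. (9): mean of |U - u| over all N grid points."""
    return sum(abs(exact(x, t) - predicted(x, t)) for x, t in grid) / len(grid)


# Usage with a stand-in "prediction" that is 1% too small everywhere.
grid = make_grid()
U = lambda x, t: math.sin(x - 5.0 * t)             # exact convection solution, beta = 5
u_hat = lambda x, t: 0.99 * math.sin(x - 5.0 * t)  # hypothetical network output
print(len(grid), absolute_error(U, u_hat, grid))
```

Because the metric is an unnormalized mean, its scale depends on the magnitude of the solution; errors are therefore only directly comparable between runs of the same PDE.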


3. Results and discussion 
The concept behind curriculum learning [50], inspired by human education, is to start optimizing the problem with an easier training criterion and gradually increase the level of difficulty over time. Curriculum learning has been shown to improve various ML models for different applications, including PINNs for solving PDEs with large coefficients. Motivated by this, we propose to use curriculum learning to predict the solution to problems with complex initial conditions. The main idea is to train the base model using a simple initial condition and progressively transition to a more difficult initial condition after a certain number of iterations. This way, the model has the opportunity to learn the easier constraint and build a solid foundation for learning the target constraint.
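The curriculum idea can be sketched as a stage schedule plus a warm-started training loop. We assume the total iteration budget is split evenly over the stages; with 2.5 × 10^4 iterations spread over five values of α, this gives 5 × 10^3 iterations per stage. The function names and the `train_step` callback are hypothetical placeholders for an actual PINN optimizer step.

```python
def curriculum_schedule(values, total_iters):
    """Split a fixed iteration budget evenly across curriculum stages.

    `values` lists the initial-condition parameter for each stage, ordered from
    easiest to hardest (e.g. alpha = 1..5 or eta = 2..8). Returns tuples of
    (value, iterations in this stage, cumulative iterations so far).
    """
    base, extra = divmod(total_iters, len(values))
    schedule, cumulative = [], 0
    for i, value in enumerate(values):
        n_iters = base + (1 if i < extra else 0)  # spread any remainder over early stages
        cumulative += n_iters
        schedule.append((value, n_iters, cumulative))
    return schedule


def train_with_curriculum(params, values, total_iters, train_step):
    """Warm-start each stage from the previous stage's parameters.

    `train_step(params, value)` is a hypothetical callback performing one optimizer
    step on the PINN loss whose initial condition uses parameter `value`.
    """
    for value, n_iters, _ in curriculum_schedule(values, total_iters):
        for _ in range(n_iters):
            params = train_step(params, value)
    return params


for stage in curriculum_schedule([1, 2, 3, 4, 5], 25_000):
    print(stage)
```

The key design choice is that parameters are never reset between stages, so each harder initial condition starts from a network that already satisfies an easier version of the problem.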

For the one-dimensional convection problem with the initial condition given in Eq. (4), sin(αx + kπ), the baseline PINN model is trained with and without curriculum learning for different values of the constants in the initial condition. After training, we measure the absolute error between the analytical and predicted solutions using Eq. (9). Note that the data points associated with PINN (denoted with hollow circles) in Figure 1 are obtained by running the code for 2.5 × 10^4 iterations for each distinct value of the specified constant (α or k), whereas the data points associated with PINN + Curriculum are obtained with the same total number of iterations spread over the whole range of the initial condition constant. For example, at k = 0 in Figure 1(a), PINN trains for 2.5 × 10^4 iterations, but curriculum learning trains for only 5 × 10^3. This clearly grants an unfair advantage to the “Vanilla” PINN model.
Figure 1. Variation of absolute error with initial condition parameters for “PINN” and “PINN + Curriculum” models. The PDE … = 1, (b) β = 5, k = 0, (c) β = 5, k = 0.5, and (d) β = 15, k = 0.

Figure 2. Predicting the solution to a 1D convection problem using “PINN” and “PINN + Curriculum” baseline models. The other parameters are (a) β = 5, α = 5 and (b) β = 15, α = 5; k = 0 for both cases.
The variation of absolute error as a function of the phase parameter k for a convection system with β = 5 and α = 5 is depicted in Figure 1(a). As one can see, the error of the model with curriculum learning decreases with increasing k, while the error rises for the PINN model. Although the value of k influences the trainability of the network, its effect is not as notable as that of the angular frequency α. Figures 1(b) and 1(c) show the trends of absolute error as α increases from 1 to 5, for k = 0 and k = 0.5, respectively. As expected, the Curriculum model outperforms the PINN model for complex initial conditions (large values of α). PINN delivers better performance for small values of α because of the unfair advantage it has over Curriculum. For example, at α = 1, PINN trains for 2.5 × 10^4 iterations, but Curriculum only trains for 5 × 10^3; hence, PINN obtains a lower error. The predicted solutions of these models, as well as the exact solution for β = 5, α = 5, and k = 0, are reported in Figure 2(a). It can be seen that, unlike PINN, the Curriculum method successfully captures the solution on the entire spatiotemporal domain. To test the robustness of our proposed model, we add further complexity to the problem by increasing the convection strength to β = 15. As in the previous case, curriculum learning notably improves the performance: the absolute error drops by around 2 orders of magnitude for intermediate values of α and by 1 order of magnitude for α = 5 (see Figure 1(d)).
Figure 3. (a) Variation of absolute error with initial condition parameters. (b) The exact solution and the predicted solutions using (c) “PINN” and (d) “PINN + Curriculum” baseline models for a 1D reaction-diffusion problem with ν = ρ = 3 and η = 8.
The visualizations of the predicted solutions in Figure 2(b) show that the “Vanilla” PINN model clearly fails to learn the solution; however, combining PINN and curriculum learning results in an accurate prediction.

We next look at a one-dimensional reaction-diffusion flow with a Gaussian initial condition (Eq. (7)). Here, we consider four different values of η ranging from 2 to 8. In general, as η increases, the initial Gaussian becomes narrower (σ = π/η) and diffusion acts more strongly, making the problem more difficult to learn. The variation of absolute error with η is illustrated in Figure 3(a) for the case ν = ρ = 3. One can clearly see that in the extreme case η = 8, curriculum learning lowers the absolute error of PINN from 1.96 × 10^-1 to 8.94 × 10^-3, an improvement of more than an order of magnitude. We also show the predicted solutions and the ground-truth solution obtained using analytical methods in Figures 3(b)-(d). The PINN model is clearly incapable of predicting either the reaction or the diffusion component. However, by first exposing the network to the easier problem (η = 2) and gradually increasing η, we were able to capture the solution in the whole domain.

Evolutionary sampling (Evo) [43], inspired by biological evolution, is an iterative sampling strategy developed to address the propagation failure of PINNs when solving a PDE with high residuals in very narrow regions. In this method, we first generate collocation points from a uniform distribution. Then, at each subsequent iteration, we retain the collocation points whose absolute PDE residual is greater than a predefined threshold and resample the remaining points from a uniform distribution. Finally, we merge the resampled population with the retained population to create the population for the next iteration. The main idea behind Evo sampling is to include more collocation points from high-residual regions, strengthening the representation of these regions in the overall residual loss.
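The retain-and-resample loop described above can be sketched as one update of the collocation population. The retention threshold is a free parameter; when none is given, we default to the mean absolute residual of the current population, which is our assumption about a reasonable choice rather than a detail stated in the text. The residual function, domain bounds (X = [0, 2π], T = 1), and the function name `evo_step` are hypothetical.

```python
import math
import random


def evo_step(points, residual, rng, domain=((0.0, 2 * math.pi), (0.0, 1.0)),
             threshold=None):
    """One evolutionary-sampling update that keeps the population size fixed.

    Points with |residual| > threshold are retained; all others are replaced by
    fresh uniform samples from the space-time domain.
    """
    (x_lo, x_hi), (t_lo, t_hi) = domain
    scores = [abs(residual(x, t)) for x, t in points]
    if threshold is None:
        # Assumption: use the population's mean |R| as the retention threshold.
        threshold = sum(scores) / len(scores)
    retained = [p for p, s in zip(points, scores) if s > threshold]
    resampled = [(rng.uniform(x_lo, x_hi), rng.uniform(t_lo, t_hi))
                 for _ in range(len(points) - len(retained))]
    # Merge the two sub-populations to form the next population.
    return retained + resampled


# Usage: a residual that is large only for x < 1 pulls the population there.
rng = random.Random(0)
res = lambda x, t: 10.0 if x < 1.0 else 0.0
pop = [(rng.uniform(0, 2 * math.pi), rng.uniform(0, 1)) for _ in range(200)]
for _ in range(10):
    pop = evo_step(pop, res, rng)  # in training, L_r would be evaluated on `pop` here
print(sum(x < 1.0 for x, _ in pop) / len(pop))
```

In an actual training run, one such update would be interleaved with each optimizer step, so the population keeps tracking wherever the current residual is largest.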


Figure 4. (a) Variation of absolute error with initial condition parameters. (b) The exact solution and the predicted solutions using (c) “Evo” and (d) “Evo + Curriculum” baseline models for a 1D convection problem with β = 15, α = 5, k = 0.
Here, we again consider a one-dimensional convection problem with β = 15 and k = 0. We examined this scheme using the same NN architecture as before. Compared to PINNs, Evo sampling methods require a higher number of iterations to converge; therefore, we trained the model for 10^5 epochs (four times the number of epochs used for the PINN method). As in the previous section, we give the Evo method an advantage over the Evo + Curriculum method in terms of the number of iterations. Figure 4(a) shows that employing curriculum learning on top of Evo sampling causes the absolute error to drop for larger values of the angular frequency α. Moreover, the exact and predicted solutions are depicted in Figures 4(b)-(d). We observe that the Evo + Curriculum method, unlike Evo sampling, successfully captures the solution in the entire domain. Comparing the PINN + Curriculum and Evo + Curriculum methods, we see that the latter produces slightly more accurate results; however, this comparison is biased in favor of the Evo + Curriculum method, since its total number of iterations is considerably higher. In general, Evo has been shown to outperform PINN when trained for a relatively high number of epochs, especially for PDEs with very large coefficients (e.g., the convection equation with β > 30).

4. Conclusion 

SciML models, and more specifically physics-informed neural networks, present an exciting opportunity to extend the use of ML techniques to a variety of scientific and engineering problems. However, combining ML approaches with PDE-based constraints that serve as a soft regularization term can result in failure modes that prevent the learning of the fundamental physics governing a problem. We studied one-dimensional convection and reaction-diffusion problems and showed that the “Vanilla” PINN and Evo sampling models are unable to predict the solutions to these problems when a non-trivial initial condition is imposed. We proposed implementing curriculum learning, where the baseline model trains on simple initial conditions before being exposed to the complex target initial condition. We showed that this approach lowers the absolute error by 1 to 2 orders of magnitude and can successfully capture the solution to the PDEs. Addressing the limitations associated with SciML models will be crucial if we hope to build a closer integration between scientific theories and machine learning formulations.